NLP-with-Python | Scikit-Learn, NLTK, Spacy, Gensim, Textblob | Machine Learning library
kandi X-RAY | NLP-with-Python Summary
Scikit-Learn, NLTK, Spacy, Gensim, Textblob and more.
Community Discussions
Trending Discussions on NLP-with-Python
QUESTION
I have been following this example for using doc2vec for text classification:
I ran this notebook on my datasets and want to apply one of the doc2vec models to a 3rd dataset (e.g., the overall dataset the test/train model was built on). I tried:
ANSWER
Answered 2020-Jan-21 at 06:06
A gensim Doc2Vec model may be saved and loaded using the .save(filepath) and .load(filepath) methods. Using these native-to-gensim methods works on larger models than plain Python pickling can support, and stores some of the larger internal arrays more efficiently as separate files. (If moving the saved model, be sure to keep these subsidiary files alongside the main file at exactly the filepath location.)
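A minimal sketch of that round trip, using a toy corpus and a hypothetical file name (doc2vec_reviews.model); in practice the model would be the one trained in the notebook:

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Tiny illustrative corpus; in practice this would be the notebook's full training set.
corpus = [
    TaggedDocument(words=["machine", "learning", "with", "python"], tags=[0]),
    TaggedDocument(words=["natural", "language", "processing", "with", "gensim"], tags=[1]),
]
model = Doc2Vec(corpus, vector_size=50, min_count=1, epochs=20)

# Native gensim persistence: large internal arrays may be written as separate
# .npy files next to this path, so keep them together when moving the model.
model.save("doc2vec_reviews.model")            # hypothetical path
reloaded = Doc2Vec.load("doc2vec_reviews.model")
```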
A previously-trained Doc2Vec model can generate doc-vectors for new texts via the .infer_vector(list_of_words) method.
Note that the list_of_words provided to this method should have been preprocessed/tokenized exactly the same way as the training data, and any words that weren't present (or sufficiently min_count-frequent) in the training data will be ignored. (At the extreme, this means that if you pass in a list_of_words with no recognized words, every word will be ignored, and you'll get back a randomly-initialized but completely unimproved-by-inference vector.)
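A rough sketch of inference under those constraints, continuing from the save/load example above; simple_preprocess here is only a stand-in for whatever tokenization the notebook actually used:

```python
from gensim.models.doc2vec import Doc2Vec
from gensim.utils import simple_preprocess

model = Doc2Vec.load("doc2vec_reviews.model")  # hypothetical path from the sketch above

# Tokenize the new text the same way the training texts were tokenized;
# simple_preprocess is just one possible tokenizer.
new_text = "applying the trained doc2vec model to a third dataset"
tokens = simple_preprocess(new_text)

# Words absent from the training vocabulary (or below min_count) are silently ignored.
vector = model.infer_vector(tokens, epochs=50)
```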
Still, if you're re-evaluating or re-training the downstream predictive models on new data from some new domain, you'd often want to re-train the Doc2Vec stage as well, with all available data, so that it has a chance to learn new words from new usage contexts. (It's mainly when your training data was extensive and representative, and your new data comes in incrementally without major shifts in vocabulary/usage/domain, that you'd want to rely on .infer_vector().)
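If re-training is the route taken, a rough sketch under those assumptions (placeholder texts and parameters, not the notebook's actual data) might look like this:

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from gensim.utils import simple_preprocess

# Hypothetical combined corpus: the original training texts plus the new domain's texts.
old_texts = ["an example review from the original training data"]
new_texts = ["a new document from the new domain with previously unseen vocabulary"]
corpus = [
    TaggedDocument(words=simple_preprocess(text), tags=[i])
    for i, text in enumerate(old_texts + new_texts)
]

# Re-train from scratch so new words and usage contexts enter the vocabulary;
# real data would typically use a higher min_count and more epochs.
model = Doc2Vec(corpus, vector_size=50, min_count=1, epochs=40)
model.save("doc2vec_combined.model")  # hypothetical path
```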
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported